Machine-Aware Atomic Broadcast Trees for Multicores
Abstract
The performance of parallel programs on multicore machines often depends critically on group communication operations, such as barriers and reductions, being highly tuned to the hardware, a task that requires considerable developer skill. Smelt is a library that automatically builds efficient inter-core broadcast trees tuned to individual machines. It uses a machine model derived from hardware registers plus micro-benchmarks that capture the low-level machine characteristics missing from vendor specifications. Experiments on a wide variety of multicore machines show that near-optimal tree topologies and communication patterns are highly machine-dependent, but that Smelt can nevertheless derive them, often improving performance over well-known static topologies. Furthermore, we show that the broadcast trees built by Smelt can serve as the basis for complex group operations such as global barriers or state machine replication, and that the hardware tuning provided by the underlying tree is sufficient to deliver performance as good as or better than state-of-the-art approaches: the higher-level operations require no further hardware optimization.
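The abstract does not spell out how Smelt turns its machine model into a tree. As a rough illustration of the general idea only, the sketch below builds a broadcast tree with a generic greedy heuristic over two pairwise cost matrices, then compares it against a static star topology under the same cost model. Everything here is an assumption rather than Smelt's method or API: the function names (`build_greedy_tree`, `broadcast_time`, `two_socket_matrix`) are hypothetical, the cost model (a sender stays busy for `overhead[s][r]` per send, and the message reaches the receiver `latency[s][r]` after the send starts) is simplified, and the two-socket numbers are made up rather than measured.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def two_socket_matrix(n: int, per_socket: int, local: float, remote: float) -> Matrix:
    """Hypothetical cost matrix: one cost within a socket, another across sockets."""
    return [[0.0 if a == b else (local if a // per_socket == b // per_socket else remote)
             for b in range(n)] for a in range(n)]


def build_greedy_tree(root: int, latency: Matrix,
                      overhead: Matrix) -> Tuple[Dict[int, List[int]], float]:
    """Greedy sketch (not Smelt's algorithm): repeatedly add the (sender, receiver)
    pair whose message would arrive earliest under the hypothetical cost model."""
    n = len(latency)
    ready = {root: 0.0}                      # when each informed core can start its next send
    children: Dict[int, List[int]] = {i: [] for i in range(n)}
    uninformed = set(range(n)) - {root}
    completion = 0.0
    while uninformed:
        arrival, s, r = min((ready[u] + latency[u][v], u, v)
                            for u in ready for v in uninformed)
        children[s].append(r)                # sends are issued in list order
        ready[s] += overhead[s][r]           # sender is busy for the send overhead
        ready[r] = arrival                   # receiver can forward once it holds the message
        completion = max(completion, arrival)
        uninformed.remove(r)
    return children, completion


def broadcast_time(root: int, children: Dict[int, List[int]],
                   latency: Matrix, overhead: Matrix) -> float:
    """Modeled completion time of a broadcast along any tree, same cost model."""
    finish = 0.0
    stack = [(root, 0.0)]                    # (core, time it receives the message)
    while stack:
        node, t = stack.pop()
        finish = max(finish, t)
        start = t
        for c in children.get(node, []):
            stack.append((c, start + latency[node][c]))
            start += overhead[node][c]       # next send starts after this one is issued
    return finish


if __name__ == "__main__":
    # Four cores on two sockets; remote messages are assumed to be costlier.
    lat = two_socket_matrix(4, 2, local=1.0, remote=3.0)
    ovh = two_socket_matrix(4, 2, local=0.5, remote=1.0)
    tree, t_greedy = build_greedy_tree(0, lat, ovh)
    print(tree, t_greedy)                                 # greedy: 0->1, 0->2, 1->3 in 4.0
    print(broadcast_time(0, {0: [1, 2, 3]}, lat, ovh))    # static star: 4.5
```

In this toy model the greedy tree lets core 1 forward to the second socket in parallel with the root, finishing earlier than the star. On real hardware the cost matrices would come from measurements, which is the role the abstract assigns to Smelt's micro-benchmarks.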